Microoperation-based parameter auto-optimization method of Hadoop

LI Yunshu, TENG Fei, LI Tianrui

Journal of Computer Applications 2019, 39 (6): 1589-1594. DOI: 10.11772/j.issn.1001-9081.2018122592

As a large-scale distributed data processing framework, Hadoop has been widely used in industry in recent years. Manual and experience-based parameter optimization are currently ineffective because the running process is complex and the parameter space is large. To solve this problem, a method and an analytical framework for Hadoop parameter auto-optimization were proposed. Firstly, the operation process of a job was broken down into several microoperations, defined at a granularity fine enough that each is directly affected by the variable parameters, so that the relationship between the parameters and the execution time of a single microoperation could be analyzed. Then, by reconstructing the job operation process from the microoperations, a model of the relationship between the parameters and the execution time of the whole job was established. Finally, various search-based optimization algorithms were applied to this model to obtain the optimized system parameters quickly and efficiently. Experiments were conducted with two types of jobs, terasort and wordcount. The experimental results show that, compared with the default parameters, the proposed method reduces the job execution time by at least 41% and 30% respectively. The proposed method can effectively improve the job execution efficiency of Hadoop and shorten the job execution time.
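The three steps described in the abstract (model each microoperation, recompose the job time, search the parameter space) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two cost functions, their constants, and the parameter names `sort_mb` and `factor` (loosely inspired by a sort buffer size and a merge factor) are hypothetical and are not taken from the paper, and an exhaustive grid search stands in for the search algorithms the authors used.

```python
from itertools import product

# Hypothetical microoperation cost models (in seconds); NOT the paper's models.
def spill_time(sort_mb, data_mb=1024):
    spills = -(-data_mb // sort_mb)          # ceiling division: number of spill files
    return 2.0 * spills

def merge_time(sort_mb, factor, data_mb=1024):
    spills = -(-data_mb // sort_mb)
    passes = 0
    while spills > 1:                        # each pass merges up to `factor` files
        spills = -(-spills // factor)
        passes += 1
    return 5.0 * passes

def job_time(params):
    # Recompose the whole-job time from the microoperation times.
    return (spill_time(params["sort_mb"])
            + merge_time(params["sort_mb"], params["factor"]))

def grid_search(space):
    # Exhaustive search; a stand-in for any search-based optimizer.
    names = list(space)
    candidates = (dict(zip(names, values)) for values in product(*space.values()))
    return min(candidates, key=job_time)

space = {"sort_mb": [100, 200, 400], "factor": [10, 50, 100]}
default = {"sort_mb": 100, "factor": 10}
best = grid_search(space)
```

Because the search runs against the analytical model rather than against real job executions, each candidate costs only a function call, which is what makes searching a large parameter space practical.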

Multidimensional topic model for oriented sentiment analysis based on long short-term memory

TENG Fei, ZHENG Chaomei, LI Wen

Journal of Computer Applications 2016, 36 (8): 2252-2256. DOI: 10.11772/j.issn.1001-9081.2016.08.2252

To address the low accuracy of global Chinese microblog sentiment classification, a new model was introduced from the perspective of Multi-dimensional Topics based on Long Short-Term Memory (MT-LSTM). The proposed model uses hierarchical multidimensional sequence computation. It is composed of a Long Short-Term Memory (LSTM) cell network and is suitable for processing vectors, arrays and higher-dimensional data. Firstly, each microblog was divided into multiple levels for analysis. In the upward direction, the sentiment tendencies of words and phrases were analyzed by three-Dimensional Long Short-Term Memory (3D-LSTM); in the rightward direction, the sentiment tendencies of the whole microblog were analyzed by Multi-Dimensional Long Short-Term Memory (MD-LSTM). Secondly, sentiment tendencies were analyzed by Gaussian distribution in topic sign. Finally, the classification result was obtained by weighting the above analyses. The experimental results show that the average precision of the proposed model reached 91%, with a maximum of 96.5%, and that the recall for neutral microblogs reached 50%. In comparison experiments with the Recursive Neural Network (RNN) model, the F-measure of MT-LSTM improved by more than 40%; compared with no topic division, the F-measure of MT-LSTM improved by 11.9% because of the fine-grained topic division. The proposed model has good overall performance: it can effectively improve the accuracy of analyzing Chinese microblog sentiment tendencies, and it reduces the amount of training data and the complexity of matching calculation.

Improved adaptive collaborative filtering algorithm to change of user interest

HU Weijian, TENG Fei, LI Lingfang, WANG Huan

Journal of Computer Applications 2016, 36 (8): 2087-2091. DOI: 10.11772/j.issn.1001-9081.2016.08.2087

As a recommendation algorithm widely used in industry, collaborative filtering can predict the items a user is likely to prefer based on the user's historical behavior records. However, traditional collaborative filtering algorithms do not take into account the drift of user interests, and they also fall short when the timeliness of recommendations is considered. To solve these problems, the similarity measure was improved by incorporating how user interests change over time. At the same time, an enhanced time attenuation model was introduced to measure the predictive value. By combining these two approaches, the concept drift problem of user interests was solved and the timeliness of the recommendation algorithm was also taken into account. In the simulation experiment, predictive scoring accuracy and Top N recommendation accuracy were compared among the proposed algorithm and the UserCF, TCNCF, PTCF and TimeSVD++ algorithms on different data sets. The experimental results show that the improved algorithm can reduce the Root Mean Square Error (RMSE) of the prediction score, and that it outperforms all the compared algorithms on the accuracy of Top N recommendation.
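The idea of a time attenuation model can be sketched as follows. This is only an illustration under stated assumptions: the half-life decay function, the `half_life_days` parameter and the neighbor tuple layout are hypothetical choices, not the enhanced attenuation model or the improved similarity measure from the paper. The `rmse` helper implements the standard Root Mean Square Error used as the evaluation metric.

```python
import math

def decay_weight(age_days, half_life_days=30.0):
    # Hypothetical attenuation: a rating loses half its weight every half-life.
    return 0.5 ** (age_days / half_life_days)

def predict(neighbors, half_life_days=30.0):
    """Predict a rating from (similarity, rating, age_days) tuples.

    Each neighbor's rating is weighted by its similarity times a time-decay
    factor, so recent behavior counts more than old behavior.
    """
    num = den = 0.0
    for sim, rating, age_days in neighbors:
        w = sim * decay_weight(age_days, half_life_days)
        num += w * rating
        den += abs(w)
    return num / den if den else None

def rmse(predicted, actual):
    # Root Mean Square Error between predicted and actual ratings.
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

For example, with two equally similar neighbors, a fresh rating of 5 and a 30-day-old rating of 1 give a prediction of about 3.67 rather than the unweighted mean of 3, because the older rating carries half the weight.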

Real-time fault-tolerant technology for Hadoop based on heartbeat expired time mechanism

GUAN Guodong, TENG Fei, YANG Yan

Journal of Computer Applications 2015, 35 (10): 2784-2788. DOI: 10.11772/j.issn.1001-9081.2015.10.2784

The heartbeat mechanism in Hadoop is not well suited to short jobs, and it ignores the fairness of the expired time settings of nodes in a heterogeneous cluster. To overcome this problem, a fair expired time fault-tolerant mechanism was proposed. First, a failure misjudgment loss model and a Fair MisJudgment Loss (FMJL) algorithm were put forward according to the reliability and computational performance of nodes, so as to meet the requirements of long jobs and short jobs at the same time. Then a fair expired time mechanism based on the FMJL algorithm was designed and implemented. When a 345-second short job was run on Hadoop with the proposed fair expired time mechanism, the completion time was reduced by 44% when there was a fault on TaskTracker nodes, and by 23% compared with the self-adaptation expired time mechanism. The experimental results show that the proposed fair expired time mechanism shortens the fault-tolerant processing time without affecting the completion time of long jobs, and can improve the real-time processing ability of a heterogeneous Hadoop cluster.
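A per-node expired time can be sketched with a small heartbeat monitor. This is a hedged illustration of the general mechanism only: the `HeartbeatMonitor` class, its method names and the 600-second default are assumptions made here, and the sketch does not implement the FMJL algorithm, which would be responsible for choosing each node's expiry value.

```python
class HeartbeatMonitor:
    """Tracks heartbeats with a per-node expired time (hypothetical sketch)."""

    def __init__(self, default_expiry=600.0):
        self.default_expiry = default_expiry  # seconds; illustrative default
        self.expiry = {}                      # node -> expired time in seconds
        self.last_seen = {}                   # node -> time of last heartbeat

    def register(self, node, now, expiry=None):
        # A node-specific expiry could come from a fairness algorithm such as
        # FMJL; here the caller simply supplies it.
        self.last_seen[node] = now
        self.expiry[node] = self.default_expiry if expiry is None else expiry

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def expired(self, now):
        # Nodes whose silence has exceeded their own expired time.
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.expiry[n])
```

With a single global expiry, a failure during a short job would go undetected for longer than the job itself; giving each node its own value lets a failed node be declared lost, and its tasks rescheduled, much sooner.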
